Content-based image retrieval (CBIR) uses content features to search for and retrieve images
from a large database. Earlier work studied hand-crafted feature descriptors based on visual
cues such as shape, colour, and texture to represent images. Over the last decade, however, deep
learning has been widely applied as an alternative to the hand-engineered designs that were
previously dominant; the features are learnt automatically from the data. This research work
proposes an integrated dual deep convolutional neural network (IDD-CNN). IDD-CNN comprises two
distinct CNNs: the first CNN extracts the features, and a further custom CNN is designed to
exploit custom features. Moreover, a novel directed graph is designed that comprises two blocks,
i.e. a learning block and a memory block, which help in finding the similarity among images.
Since this research considers a large dataset, an optimal strategy is introduced for compact
features. IDD-CNN is evaluated on benchmark datasets, including the Oxford dataset, using the
mean average precision (mAP) metric, and comparative analysis shows that IDD-CNN outperforms
the other existing models.

Keywords: Convolutional neural network; Image retrieval; Images; Integrated dual deep-CNN

1. INTRODUCTION

In the last two decades, enormous technological development has led to huge growth in the use of
the internet, smartphones, and digital cameras. This, in turn, has increased the storing and
sharing of multimedia data such as images and videos. An image is one of the more complex forms
of data; hence, searching for a relevant image in an archive is considered a challenging task
[1]. In the conventional text-based approach, the user submits a query by entering keywords or
text, which is matched against the text or keywords stored in the archive. However, this process
also retrieves images that are not relevant to the query [2], [3]. Content-based image retrieval
(CBIR) instead defines the problem as searching for images in a large dataset based on semantic
matching.

The CBIR approach aims at searching images based on their visual content. Given a query image,
the aim is to find images that contain a similar scene or object, which might have been captured
under various conditions [3]-[5]. Category-level CBIR aims at finding images of the same class as
the query. Figure 1 shows the general process of deep learning backed CBIR [6]. Six blocks are
present in Figure 1, but only two of them (the content information and the query images) are
handled using a deep learning-based methodology. One deep learning block processes the query
image, while another deep learning block extracts the features. Notable characteristics are then
chosen from the extracted features, these features are compared during feature matching, and the
results are reported as metrics [7]-[9].

Figure 1. Content-based image retrieval using the deep learning approach


Recently, this problem has been addressed using convolutional neural networks (CNN), which also
provide higher accuracy. The success of CNNs in image classification has drawn considerable
attention to neural network technologies. This success is largely due to enormous annotated
datasets such as ImageNet. Collecting training data is an expensive process, and manual
annotation is prone to errors. Networks trained for image classification nevertheless adapt well
to other tasks: CNN activations from networks trained for classification have been used as
off-the-shelf image descriptors and adapted successfully to various tasks. For image retrieval
specifically, different approaches have directly used the network activations as image features
and performed image search successfully [8]. Deep learning has thus learnt powerful features.
However, some major challenges remain concerning: i) semantic gap reduction, ii) improving the
scalability of retrieval, and iii) balancing the efficiency and accuracy of retrieval.

An approach of orderless fusion of multilayers (MOF) is proposed in [10], inspired by the
orderless pooling of multilayers (MOP) [11] used for image retrieval. However, because local and
global features are treated identically, the local features play no distinct role in
differentiating subtle features. Zhang et al. [12] introduce a solution that emphasizes varying
multiple instances of the graph (VMIG), in which a constant semantic space is learnt to preserve
the diverse semantics of the query. The retrieval task is formulated as a multiple-instance
learning problem that connects the diverse features across modalities. In particular, a
query-guided variational autoencoder is used to model the constant semantic space rather than
learning a single embedding point. Deoxyribonucleic acid (DNA) is utilized in a proposed CBIR
methodology in which the images are first stored as DNA sequences, after which the corresponding
amino acids are extracted and used as feature vectors. In this way, the dimensionality of the
feature vectors is reduced while the required information is preserved [13]-[17]. Nayakwadi and
Fatima [18] propose a weakly supervised, CNN-based image retrieval system termed a class-agnostic
method. The images in the database are pre-processed to separate the foreground from the
background, and the foregrounds are stored as clusters. Wang et al. [19], Gao et al. [20], and
Zhu et al. [21] focus on decreasing the amount of computation performed on data at the online
stage and on avoiding mismatches caused by the mixing of backgrounds [22], [23].

In the last few years, the complexity of multimedia content, especially images, has grown
exponentially, and millions of images are uploaded daily to archives such as Twitter, Facebook,
and Instagram. Searching for a relevant image in an archive is a challenging research problem for
the computer vision community. Most search engines retrieve images using traditional text-based
approaches that rely on captions and metadata. In the last two decades, extensive research has
been reported on CBIR, image classification, and analysis. In CBIR and image classification
models, high-level image visuals are represented as feature vectors that consist of numerical
values. Research shows that there is a significant gap between image feature representation and
human visual understanding. This research work develops a dual-deep CNN for content-based image
retrieval that integrates two CNNs. The first CNN extracts the deep features, which are passed to
the second CNN, which exploits the semantic features specific to the dataset. Furthermore, a
novel directed graph is designed with two distinct nodes, known as the memory node and the
learning block node; for large datasets, a novel strategy is introduced for generating efficient
features. IDD-CNN is evaluated on the two distinct Oxford and Paris datasets using metrics such
as mean average precision (mAP) and mean absolute error (MAE).

This research work is organized as follows. Section 1 starts with the background of image
retrieval and the importance of CBIR with deep learning. Further, a few existing models are
discussed along with their shortcomings. The section ends with the research motivation and
contribution. Section 2 discusses the proposed architecture of IDD-CNN along with its algorithm
and mathematical modelling. Section 3 evaluates IDD-CNN at various difficulty levels.


2. PROPOSED METHOD 

Content-based image retrieval (CBIR) is a widely used technique for retrieving images from huge,
unlabeled image databases. Rapid access to these huge collections and the retrieval of images
similar to a given query image present major challenges and require efficient techniques. The
performance of a content-based image retrieval system depends crucially on the feature
representation and the similarity measurement. Figure 2 shows the proposed model workflow: the
input image is given to the first deep CNN to extract features, and the generated features are
given to the second custom deep CNN, which helps in exploiting more features. Moreover, a novel
directed graph is designed along with two distinct blocks, i.e. a memory block and a learning
block.


[Figure 2 diagram: input image, second custom deep CNN, feature updation, learning blocks, memory block, similarity matching]

Figure 2. Proposed workflow


Consider M nodes with F-dimensional node features, denoted $Z \in T^{M \times F}$; forward
propagation of the designed custom convolutional layer is computed as (1).

$$J^{(n+1)} = \sigma\left(\tilde{F}^{-1/2}\,\tilde{C}\,\tilde{F}^{-1/2}\,J^{(n)}\,Y^{(n)}\right) \tag{1}$$

In (1), $J^{(n)}$ indicates the output of the $n$-th layer of the custom CNN, with $Z$ as the
input, i.e. $J^{(0)} = Z$. Furthermore, $C$ indicates the adjacency matrix of the given
graph-structured data. The adjacency matrix with self-connections is defined as
$\tilde{C} = C + K_M$, with $K_M$ the identity matrix. $\tilde{F}$ indicates the diagonal degree
matrix with elements $\tilde{F}_{ii} = \sum_j \tilde{C}_{ij}$. Further, $Y^{(n)}$ indicates the
weight matrix of the $n$-th FC layer, and $\sigma$ indicates the activation function that
introduces non-linearity in the convolutional layers. Moreover, (1) can be further split into (2).

$$J^{(n+1)} = \sigma\left(B^{(n)}\,Y^{(n)}\right) \tag{2}$$

Equation (2) has the same form as the forward propagation of an FC layer. The input to the
$n$-th layer, however, is first aggregated as in (3) through a fixed weight matrix, the
normalized adjacency matrix $R$: each node feature vector in $J^{(n)}$ is updated with the
features of its adjacent nodes before the weight matrix is applied.

$$B^{(n)} = R\,J^{(n)} \tag{3}$$

$R$ indicates the normalized adjacency matrix, given in (4).

$$R = \tilde{F}^{-1/2}\,\tilde{C}\,\tilde{F}^{-1/2} \tag{4}$$
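
The propagation rule in (1) to (4) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the ReLU activation, and the toy three-node graph are assumptions chosen for illustration, and the adjacency matrix is assumed to have non-negative entries so that every degree is at least 1 once self-connections are added.

```python
import numpy as np

def normalized_adjacency(C):
    """Normalized adjacency R of (4): add self-connections, then scale by degree."""
    C_tilde = C + np.eye(C.shape[0])       # adjacency with self-connections
    degree = C_tilde.sum(axis=1)           # diagonal of the degree matrix
    inv_sqrt = 1.0 / np.sqrt(degree)       # safe: degree >= 1 for non-negative C
    return inv_sqrt[:, None] * C_tilde * inv_sqrt[None, :]

def graph_layer(J, C, Y, activation=lambda x: np.maximum(x, 0.0)):
    """One forward step of (1); ReLU is an assumed choice of activation."""
    R = normalized_adjacency(C)
    B = R @ J                               # neighbour aggregation, as in (3)
    return activation(B @ Y)                # FC-style transform, as in (2)

# Hypothetical toy graph: three nodes connected in a path 0 - 1 - 2.
C = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
Z = np.eye(3)                               # input node features, J^(0) = Z
Y = np.ones((3, 2))                         # weight matrix mapping 3 -> 2 dims
R = normalized_adjacency(C)
J1 = graph_layer(Z, C, Y)
print(R.round(3))
print(J1.shape)
```

Computing the aggregation and the weight multiplication as two separate steps mirrors the split of (1) into (2) and (3); the normalization keeps feature magnitudes comparable between nodes with few and many neighbours.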


Moreover, building on previous work on manifold mapping, this research develops an integrated
custom CNN that learns a novel feature representation, in which features are updated through the
corresponding neighbouring nodes in the given database. Algorithm 1 shows the IDD-CNN algorithm.


Algorithm 1. IDD-CNN algorithm

Input: memory block with its size o, learning block M, novel directed graph i, and training image set K
Output: learning block C and network W

Step 1: Parameter initialization
Step 2: Exploitation of feature generation
Step 3: Initialize the learning block through a random process mechanism
Step 4: Design the memory block $E = [E_1, E_2, \ldots, E_o]$
Step 5: While iter is less than MIN do
Step 6:     Take the mini-batch as the input and obtain the CNN backbone output
Step 7:     Design the novel directed graphs c
Step 8:     For each directed graph, the other CNN outputs the updated feature representations
Step 9:     Update the network by backpropagation with the designed objective
Step 10:    Update the learning block considering the feature representation
Step 11:    Update the memory block with the optimized feature representation
Step 12: End while loop
Step 13: Return the optimal network and learning blocks


Algorithm 1 takes as input the memory block, the learning block, the designed directed graph, and
the training set. The expected output is the learning block along with the trained network.
First, the parameters are initialized and features are extracted using the designed CNN. Next,
the learning block is initialized and the memory block is designed; the novel directed graph is
then constructed, and another CNN produces the updated feature representation. Because the
process is iterative, the network, the learning block, and the memory block are updated in each
iteration, and the learning blocks along with the trained network are returned at the end. Once
the model is designed, it needs to be evaluated on the benchmark dataset. The evaluation is
carried out in the next section.
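
The text does not specify how the directed graph is built or how the memory block is refreshed, so the following NumPy sketch only illustrates one plausible reading of the loop in Algorithm 1. Everything in it is an assumption made for illustration: the k-nearest-neighbour rule with cosine similarity, the first-in-first-out memory update, the function names, and the random vectors that stand in for CNN backbone outputs.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def directed_knn_graph(features, k):
    """Adjacency with an edge i -> j when j is among the k nodes most similar to i."""
    f = l2_normalize(features)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)                  # a node is not its own neighbour
    neighbours = np.argsort(-sim, axis=1)[:, :k]    # k most similar nodes per row
    adj = np.zeros_like(sim)
    rows = np.repeat(np.arange(features.shape[0]), k)
    adj[rows, neighbours.ravel()] = 1.0
    return adj

def update_memory(memory, new_features):
    """Assumed FIFO rule: append the new features and keep the latest o rows."""
    o = memory.shape[0]
    return np.concatenate([memory, new_features], axis=0)[-o:]

rng = np.random.default_rng(0)
memory = rng.normal(size=(8, 4))                    # memory block with o = 8 entries
for iteration in range(3):
    batch = rng.normal(size=(2, 4))                 # stand-in for backbone features
    nodes = np.concatenate([batch, memory], axis=0) # mini-batch plus memory nodes
    graph = directed_knn_graph(nodes, k=3)
    memory = update_memory(memory, batch)
print(graph.shape, memory.shape)
```

The graph is directed because nearest-neighbour relations are not symmetric: node j may be among the closest neighbours of node i without the reverse being true.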


3. PERFORMANCE EVALUATION 

Deep learning architectures such as CNN have emerged as one of the major alternatives to
hand-designed features in the last decade, since they learn the features automatically from data.
This research work develops yet another CNN architecture to extract optimal features. This
section evaluates the IDD-CNN model on the Oxford benchmark dataset [24]. IDD-CNN is evaluated in
terms of image retrieval and metrics. A comparison is also carried out against ResNet and VGGNet
models and their variants, including the collaborative approach discussed in [25].


3.1. Dataset details and system configuration 

IDD-CNN is implemented in Python along with various deep learning libraries. The system
configuration includes a 2 TB hard disk and a 4 GB CUDA-enabled Nvidia graphics card. IDD-CNN is
evaluated on the standard Oxford5k dataset. Moreover, this research uses ROxford5k, which
comprises 4,993 images along with 70 query images. Figure 3 shows sample images of the Oxford and
Paris datasets; the first row shows five samples from the Oxford dataset and the second row shows
five samples from the Paris dataset.


Figure 3. Samples from ROxford and RParis


3.2. Image retrieval and re-ranking 

IDD-CNN performs image retrieval based on the given query. This section evaluates the image
retrieval, with re-ranking carried out based on relevance. Figure 4 shows the query image and
Figure 5 shows the top 10 images ranked according to relevance.


Figure 4. Query image 


Figure 5. Top 10 images ranked according to the relevance 
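
Ranking database images by relevance to a query can be sketched as follows. The section does not state which similarity measure IDD-CNN uses, so cosine similarity between feature vectors is an assumption here, as are the function name and the tiny two-dimensional feature vectors.

```python
import numpy as np

def rank_database(query_feat, db_feats, top_k=10):
    """Return indices and scores of the top_k database items by cosine similarity."""
    q = query_feat / np.linalg.norm(query_feat)
    d = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    scores = d @ q                          # cosine similarity to the query
    order = np.argsort(-scores)[:top_k]     # highest similarity first
    return order, scores[order]

# Hypothetical features for four database images and one query image.
db = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0],
               [-1.0, 0.0]])
query = np.array([1.0, 0.1])
order, scores = rank_database(query, db, top_k=3)
print(order)    # indices of the three most similar database images
```

In this toy case the image whose feature points in almost the same direction as the query is ranked first and the image pointing the opposite way is excluded from the top three.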


3.3. Metrics evaluation 
The mAP, sometimes simply referred to as AP, is a popular metric used to measure the performance
of models on document/information retrieval and object detection tasks.

$$\mathrm{mAP} = \frac{1}{|S|} \sum_{s \in S} \mathrm{average\_precision}(s) \tag{5}$$

In (5), $S$ indicates the set of queries and average_precision(s) indicates the average precision
for a given query $s$. mAP is one of the important metrics: the average precision is computed for
each query, and the mean of all these average precisions gives a single number, the mAP, that
summarizes the performance of the model over the query set.
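
As a worked illustration of (5), the sketch below computes average precision from a ranked list of binary relevance labels and then averages over queries. It is a simplified version that assumes every relevant image appears somewhere in the ranked list and that ignores the special handling of junk images in the revisited Oxford protocol; the function names and the toy relevance lists are illustrative.

```python
import numpy as np

def average_precision(ranked_relevance):
    """AP for one query, given 1/0 relevance labels in ranked order."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0                              # no relevant item retrieved
    hits = np.cumsum(rel)                       # relevant items seen so far
    ranks = np.arange(1, len(rel) + 1)
    precision_at_k = hits / ranks
    # Average the precision values at the ranks where a relevant item occurs.
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(per_query_relevance):
    """mAP as in (5): the mean of the per-query average precisions."""
    return float(np.mean([average_precision(r) for r in per_query_relevance]))

# Two hypothetical queries: 1 marks a relevant result, 0 an irrelevant one.
queries = [[1, 0, 1, 0], [0, 1, 1]]
print(mean_average_precision(queries))
```

For the first query the relevant results sit at ranks 1 and 3, giving precisions 1 and 2/3 and an AP of 5/6; for the second they sit at ranks 2 and 3, giving an AP of 7/12; the mean of the two is 17/24, about 0.708.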


3.4. Comparison with ResNet based architecture 

Figure 6 shows the comparison of various ResNet architectures with the proposed IDD-CNN at the
medium difficulty level. Here, R-Gem achieves 64.7, and its query expansion version R-Gem+QE
achieves 67.2. R-Gem with the deep spatial matching (DSM) approach, R-Gem+DSM, achieves 65.3.
Similarly, R-Gem+DFS achieves an mAP of 69.8, and the existing collaborative approach achieves
66. In comparison, IDD-CNN achieves 74.37. Figure 7 shows the comparison of various ResNet
architectures with the proposed IDD-CNN at the hard difficulty level. Here, R-Gem achieves 38.5,
and its query expansion version R-Gem+QE achieves 40.8. R-Gem with the DSM approach, R-Gem+DSM,
achieves 39.2. Similarly, R-Gem+DFS achieves an mAP of 40.2, and the existing collaborative
approach achieves 42.5. In comparison, IDD-CNN achieves 57.7.

Figure 6. ResNet architecture comparison with the proposed model on medium difficulty level

Figure 7. ResNet architecture comparison with the proposed model on hard difficulty level


3.5. Comparison with VGGNet based architecture 

Figure 8 shows the comparison of various VGGNet architectures with the proposed IDD-CNN at the
medium difficulty level. Here, V-Gem achieves 38.5, and its query expansion version R-Gem+QE
achieves 40.8. R-Gem with the DSM approach, R-Gem+DSM, achieves 39.2. Similarly, R-Gem+DFS
achieves an mAP of 40.2, and the existing collaborative approach achieves 42.5. In comparison,
IDD-CNN achieves 57.

Figure 9 shows the comparison of various VGGNet architectures with the proposed IDD-CNN at the
hard difficulty level. Here, V-Gem achieves 38.5, and its query expansion version R-Gem+QE
achieves 40.8. R-Gem with the DSM approach, R-Gem+DSM, achieves 39.2. Similarly, R-Gem+DFS
achieves an mAP of 40.2, and the existing collaborative approach achieves 42.5. In comparison,
IDD-CNN achieves 57.7.


3.6. Comparative analysis 

In the earlier sections, Figure 6 and Figure 7 show the comparison with various ResNet
architectures, including the existing collaborative model. The comparative analysis suggests
that, at the medium difficulty level, IDD-CNN improves on R-Gem+DFS by 6.54%; at the hard
difficulty level, IDD-CNN achieves an improvement of 35.76%. Similarly, for the VGGNet models,
IDD-CNN achieves an improvement of 4.59% over V-Gem+DFS+Collaborative. Moreover, at the hard
difficulty level, IDD-CNN achieves a 24.08% improvement over the existing model.
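
The ResNet percentages above appear to be relative improvements in mAP over the strongest compared model at each difficulty level. The short check below uses the scores reported in section 3.4 under that assumption; the choice of baselines (R-Gem+DFS at 69.8 for medium, the collaborative approach at 42.5 for hard) and the function name are inferred for illustration rather than stated by the paper.

```python
def relative_improvement(proposed, baseline):
    """Relative gain of the proposed score over the baseline, in percent."""
    return 100.0 * (proposed - baseline) / baseline

# mAP values as reported for the ResNet comparison in section 3.4.
medium = relative_improvement(74.37, 69.8)   # IDD-CNN vs R-Gem+DFS
hard = relative_improvement(57.7, 42.5)      # IDD-CNN vs collaborative approach
print(f"medium: {medium:.2f}%  hard: {hard:.2f}%")
```

Under these assumptions the computed values agree with the reported 6.54% and 35.76% to within rounding.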

Figure 8. VggNet architecture comparison with the proposed model on medium difficulty level

Figure 9. VggNet architecture comparison with the proposed model on hard difficulty level


4. CONCLUSION 

CBIR is a critical task that has become increasingly popular in image processing due to the
enormous growth in multimedia data such as images. This research work designs and develops an
efficient retrieval and ranking mechanism named integrated dual deep convolutional neural network
(IDD-CNN). IDD-CNN is evaluated first on image retrieval and re-ranking for a defined query, and
then using the mAP metric; further evaluation is carried out by comparing IDD-CNN with various
CNN models and their variant architectures. The results show that IDD-CNN is efficient and
improves on the compared models; future scope lies in reducing the image retrieval time and
evaluating on different datasets.